Open Science in Water Resources with Jupyter Notebooks






With the ever-increasing availability of data and ever more complex computational models, the computational and coding needs of water resources professionals are greater now than ever. However, the water resources community currently has no ‘lingua franca’ for scientific computing, and coding is being done in a wide variety of specialized environments and computer languages. The complexity of these computing environments keeps many practitioners from easily sharing their code with colleagues across countries and organizations. The open source community is addressing this issue by developing innovative programming tools and environments that provide a universal interface for computing in a variety of languages. Jupyter Notebooks, developed by Project Jupyter, combine rich multimedia documentation and interactive code within a standard web browser interface, facilitate the setup and sharing of customizable computing environments in a variety of programming languages, and eliminate the need for expensive licensing in many scientific computing applications. This presentation introduces Jupyter Notebooks as applied to a variety of problems in water resources engineering. Example notebooks include applications for coastal and inland water resources projects as well as statistical and data analysis tools.

What do we mean by Open Science?



...



Open Data and Open Tools!







Open Data Sources

NOAA



USGS













By Open Tools we really mean Free Open-Source Software







"Free and open-source software (FOSS) is computer software that can be classified as both free software and open-source software. That is, anyone is freely licensed to use, copy, study, and change the software in any way, and the source code is openly shared so that people are encouraged to voluntarily improve the design of the software." - Wikipedia

Why is FOSS important?

Software that is both "free as in beer" and "free as in speech" gives users of computing hardware (read 'almost everyone!') freedom of choice in the software they use and how they use their systems. This freedom has allowed for the rapid development of technologies, such as the internet and more recently the internet of things, that let us interact with computers on a day-to-day basis.






The glue that makes this possible ... Python!










Python is a highly extensible, general-purpose programming language. It was created by Guido van Rossum, first released in 1991, and is now maintained as an open source project by the Python Software Foundation.

Senor Guido



The Python design ethos is focused on simplicity, consistency, and readability. This is summarized in the "Zen of Python":

In [4]:
import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!



Because of the focus on simplicity, consistency, and readability, Python is incredibly easy to learn for beginning programmers and old hands alike.

(another Python easter egg)

>>> import antigravity

Python

*'print "hello world!"' is Python 2 parlance; in Python 3 this is 'print("hello world!")'.

In [5]:
print("hello world!")
hello world!

Where is Python used?

Simple answer - nearly everywhere.



Python is a popular programming language used primarily for web development, education, and scientific computing. However, due to its popularity, flexibility, and gentle learning curve, there are Python modules for almost anything (and new ones appear every day).

Think of Python as glue. It gets everywhere and holds things together.

Everyone here should know that it's also built into ArcGIS. But did you know it's part of QGIS as well?



Python is popular!



So, how can water professionals use Python for science at work?

The SciPy Stack!



The SciPy Stack consists of 6* modules that extend the capabilities of Python into the realms of MATLAB, R, and SAS.

These include the following:

  • SciPy --------> scientific computing in Python, including signal processing and optimization
  • NumPy --------> basic array manipulation
  • Pandas -------> data manipulation, R-style dataframes
  • SymPy --------> symbolic math à la Mathematica/Maple
  • Matplotlib ---> visualization and plotting
  • scikit-learn -> machine learning (*not strictly part of the SciPy Stack)
  • IPython ------> write and run Python code interactively in a shell or a notebook
    -- Note: the notebook functionality of IPython was recently spun off to Jupyter.
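
As a rough illustration of how the pieces listed above fit together, the sketch below runs NumPy, Pandas, and SciPy on a small streamflow-style series. The data are synthetic random numbers, and the variable names (`flow`, `monthly_mean`, `q90`) and the unit label are assumptions made up for this example, not values from any real gauge.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic daily "flows" (hypothetical values, not real gauge data)
rng = np.random.default_rng(seed=42)
dates = pd.date_range("2015-01-01", "2016-12-31", freq="D")
flow = pd.Series(
    rng.lognormal(mean=3.0, sigma=0.5, size=len(dates)),
    index=dates,
    name="flow_cfs",
)

# Pandas: aggregate the daily values to monthly means
monthly_mean = flow.resample("MS").mean()

# NumPy: 90th percentile of the daily values
q90 = np.percentile(flow.to_numpy(), 90)

# SciPy: fit a normal distribution to the log-transformed values,
# which should roughly recover the parameters used to generate them
mu, sigma = stats.norm.fit(np.log(flow.to_numpy()))

print(monthly_mean.head())
print(f"90th percentile: {q90:.1f}")
print(f"fitted log-mean: {mu:.2f}, fitted log-sd: {sigma:.2f}")
```

Swapping the synthetic series for a real record would leave the three analysis steps unchanged, which is much of the appeal of keeping data in a Pandas Series or DataFrame.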

What is Jupyter?

... This is Jupyter!



Project Jupyter was born out of the IPython Project in 2014 as it evolved to support interactive data science and scientific computing across all programming languages.









The name Jupyter is a portmanteau of the Julia, Python, and R programming languages.

However, rapid development in the past two years means that the notebook now supports more than 60 language kernels, including MATLAB, R, Ruby, JavaScript, Bash, and C++.


The Jupyter Notebook is a multi-language web-based IDE that integrates the multimedia capabilities of the modern web, the elegant typography and equation editing capabilities of LaTeX, and the powerful computing capabilities of numerous popular programming languages within a single interface.





So what makes Jupyter special?

1. Cool cell magicks!!!

Built in documentation

In [6]:
%quickref

Shell commands

In [7]:
%ls
In [25]:
%mkdir example
%cd example
%ls
/home/rannikko/git/Dewberry-RSG/AWRA_Presentation_2017/example
In [26]:
%cd ..
%rmdir example
#%ls
/home/rannikko/git/Dewberry-RSG/AWRA_Presentation_2017
In [11]:
%%bash 
for i in `seq 1 10`;
do
        echo $i
done
1
2
3
4
5
6
7
8
9
10
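
The shell magics above only work inside IPython or a notebook. For code that may later be exported to a plain script, a portable alternative is the standard library. The sketch below mirrors the make-a-directory, list-it, remove-it sequence with `pathlib` and `tempfile`; the file name `notes.txt` is a made-up placeholder.

```python
import tempfile
from pathlib import Path

# Work inside a throwaway directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    example = Path(tmp) / "example"
    example.mkdir()                                  # similar to %mkdir example
    (example / "notes.txt").write_text("demo")       # hypothetical file
    listing = sorted(p.name for p in example.iterdir())  # similar to %ls
    print(listing)
# Leaving the 'with' block deletes the directory and its contents
```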

Quickly and easily profile your code

In [13]:
import numpy as np
In [14]:
%timeit np.linalg.eigvals(np.random.rand(100,100))
The slowest run took 5.00 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 4.96 ms per loop
In [15]:
%%timeit a = np.random.rand(100, 100)
np.linalg.eigvals(a)
100 loops, best of 3: 4.82 ms per loop
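
Outside a notebook, the same kind of measurement can be made with the standard library `timeit` module. This is a minimal sketch of the idea, not what the `%timeit` magic does internally: the repeat and loop counts are fixed by hand here, whereas the magic picks them automatically.

```python
import timeit

import numpy as np

# Build the matrix once so only the eigenvalue computation is timed
a = np.random.rand(100, 100)

loops = 10
times = timeit.repeat(lambda: np.linalg.eigvals(a), repeat=3, number=loops)

# Each entry in 'times' is the total for 'loops' calls; report the best run
best_ms = min(times) / loops * 1e3
print(f"{loops} loops, best of 3: {best_ms:.2f} ms per loop")
```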

2. LaTeX

You can pass raw LaTeX text as a string to the Math object:

In [16]:
from IPython.display import Math
Math(r'F(k) = \int_{-\infty}^{\infty} f(x) e^{2\pi i k x} dx')
Out[16]:
$$F(k) = \int_{-\infty}^{\infty} f(x) e^{2\pi i k x} dx$$

Or call LaTeX with a cell magic

In [17]:
%%latex 
$F=G\frac{m_{1}m_{2}}{d^{2}}$
$F=G\frac{m_{1}m_{2}}{d^{2}}$
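
LaTeX source does not have to be typed by hand. As a sketch, SymPy can generate it from a symbolic expression, and the resulting string could then be passed to the `Math` object shown earlier. Manning's equation (SI form) is used here purely as an illustrative choice; the symbol names are assumptions for this example.

```python
import sympy as sp

# Symbols for Manning's equation in SI units (illustrative choice):
# V = mean velocity, n = roughness coefficient,
# R = hydraulic radius, S = channel slope
V, n, R, S = sp.symbols("V n R S", positive=True)
manning = sp.Eq(V, R ** sp.Rational(2, 3) * sp.sqrt(S) / n)

# Convert the symbolic equation to a LaTeX string
latex_source = sp.latex(manning)
print(latex_source)

# In a notebook, Math(latex_source) from IPython.display would render it
```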

3. HTML wizardry

Audio

In [27]:
from IPython.display import Audio
Audio(filename="sweeter_than_wine.m4a")
Out[27]:
In [28]:
Audio(filename="Kalimba.mp3")
Out[28]:

4. Documentation


Notebooks can be exported to PDF, HTML, .py files, and more.
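
Export is straightforward partly because a notebook file is plain JSON. The sketch below builds a tiny notebook in memory and pulls its code cells into a script string. It is a hand-rolled illustration of the idea, not how Jupyter's own export tooling is implemented, and the notebook content and the `code_cells_to_script` helper are hypothetical.

```python
import json

# A tiny in-memory notebook in the version 4 layout (hypothetical content)
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo\n"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello world!')\n"],
        },
    ],
}


def code_cells_to_script(nb):
    """Join the source of every code cell into one script string."""
    chunks = []
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            source = cell["source"]
            # 'source' may be stored as a list of lines or a single string
            chunks.append("".join(source) if isinstance(source, list) else source)
    return "\n".join(chunks)


# Round-trip through JSON text to mimic reading an .ipynb file from disk
script = code_cells_to_script(json.loads(json.dumps(notebook)))
print(script)
```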

Interactive widgets

Link to Interactively fitting distributions





























How do I get Jupyter and the SciPy Stack?



These packages may be installed manually or through the Python package manager "pip". However, by far the easiest way to get a working Python distribution up and running is the Anaconda Python distro. Follow the link by clicking the Jupyter icon or check out the tutorial videos below to get started.
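
Whichever install route is used, a quick way to confirm that the stack is in place is to try importing each package. The helper below is a minimal sketch; the `installed_versions` name and the package list are choices made for this example.

```python
import importlib

# Import names of the packages discussed in this presentation
PACKAGES = ["numpy", "scipy", "pandas", "sympy", "matplotlib", "IPython"]


def installed_versions(names):
    """Map each import name to its version string, or None if missing."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


report = installed_versions(PACKAGES)
for name, version in report.items():
    print(f"{name:12s} {version or 'not installed'}")
```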

In [29]:
from IPython.display import YouTubeVideo
YouTubeVideo("YJC6ldI3hWk", width=900,height=600)
Out[29]:
In [30]:
from IPython.display import YouTubeVideo
YouTubeVideo("HW29067qVWk", width=900,height=600)
Out[30]: